Sequencing and Raw Sequence Data Quality Control    ◾    47

7. Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM: The Sanger FASTQ file format for sequences

with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res 2010,

38(6):1767–1771.

8. FASTQ Files [https://support.illumina.com/help/BaseSpace_OLH_009008/Content/Source/

Informatics/BS/FASTQFiles_Intro_swBS.htm]

9. Leinonen R, Sugawara H, Shumway M: The sequence read archive. Nucleic Acids Res 2011,

39(Database issue):D19–21.

10. Andrews S: FastQC: A Quality Control Tool for High Throughput Sequence Data. Babraham

Bioinformatics, Babraham Institute, Cambridge, United Kingdom; 2010.

11. Chen Y-C, Liu T, Yu C-H, Chiang T-Y, Hwang C-C: Effects of GC bias in next-generation-

sequencing data on de novo genome assembly. PLOS One 2013, 8(4):e62856.

12. Lightfield J, Fram NR, Ely B: Across bacterial phyla, distantly-related genomes with similar

genomic GC content have similar patterns of amino acid usage. PLoS One 2011, 6(3):e17677.

13. Romiguier J, Ranwez V, Douzery EJ, Galtier N: Contrasting GC-content dynamics across 33

mammalian genomes: relationship with life-history traits and chromosome sizes. Genome

Res 2010, 20(8):1001–1009.

14. FASTX-toolkit [http://hannonlab.cshl.edu/fastx_toolkit/]

15. Bolger AM, Lohse M, Usadel B: Trimmomatic: A flexible trimmer for Illumina sequence data.

Bioinformatics 2014, 30(15):2114–2120.

16. Chen S, Zhou Y, Chen Y, Gu J: fastp: An ultra-fast all-in-one FASTQ preprocessor.

Bioinformatics 2018, 34(17):i884–i890.